Skip to content

RFC: Environment Wrap Actions#130

Open
leongdl wants to merge 5 commits into
OpenJobDescription:mainlinefrom
leongdl:taskRunWrap
Open

RFC: Environment Wrap Actions#130
leongdl wants to merge 5 commits into
OpenJobDescription:mainlinefrom
leongdl:taskRunWrap

Conversation

@leongdl
Copy link
Copy Markdown

@leongdl leongdl commented Apr 17, 2026

PR for RFC

Tracking Issue: #132

This is a request for comments about RFC 0008 — Environment Wrap Actions, which
extends <Environment> with three new session actions — onWrapEnter,
onWrapTaskRun, and onWrapExit — that let an environment template intercept and
wrap the lifecycle actions of inner environments and tasks. A companion opt-out,
runOnHost: true on <Action>, lets individual actions bypass wrapping when they
must run on the host (credential fetching, mount setup, cleanup that must always
run).

The primary motivation is portable container support: a Docker or Apptainer
environment template can start a container in onEnter, route every inner
environment's onEnter/onExit and every task's onRun into the container via the
three wrap hooks, and stop the container in onExit. Job templates and inner
environments remain portable across Conda, Rez, Docker, and Apptainer. The design
also generalizes to remote execution, session-wide instrumentation, and privilege
isolation.

This RFC is gated by the new WRAP_ACTIONS extension and depends on RFC 0002
(Model Extensions), RFC 0005 (Expression Language), and RFC 0006 (Expression
Function Library — for repr_sh() and friends).


By submitting this pull request, I confirm that you can use, modify, copy, and
redistribute this contribution, under the terms of your choice.

@leongdl leongdl changed the title [WIP] RFC: Environment On Task run wrap RFC: Environment On Task run wrap May 9, 2026
@leongdl leongdl changed the title RFC: Environment On Task run wrap RFC: Environment Wrap Actions May 9, 2026
@leongdl leongdl marked this pull request as ready for review May 9, 2026 21:46
@leongdl leongdl requested a review from a team as a code owner May 9, 2026 21:46
Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
intercept and wrap the lifecycle actions of *inner* environments and tasks. The runtime
supplies each wrap action with the wrapped action's command, args, timeout, cancelation
method, and environment variables as template variables. A companion opt-out,
`runOnHost: true` on `<Action>`, lets individual actions bypass wrapping when they
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this runOnHost also be controlled by the wrapping environment? Then an author of an external environment could have more control over this. If we leave this as is we might want to find a different name, because the distinguishing factor is that it skips the wrapping - both wrapped and non-wrapped actions are running on the host.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, unless the wrapping environment inspects and has logic for every line in the wrapped action?

Something like exporting environment variables on the host would be one case that is hard to detect

(And agree on the naming is hard)

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
`runOnHost: true` on `<Action>`, lets individual actions bypass wrapping when they
must run on the host (credential fetching, mount setup, cleanup that must always run).

The primary motivation is container support: a Docker or Apptainer environment template
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the primary motivation, but it's worth mentioning that we want this to be a general composable feature to the extent possible. If we can avoid all references to containers in the structure of the extension (vs the docs and examples which should have many), I think that will be useful.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, and makes sense.

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
onWrapTaskRun:
command: "bash"
args: ["{{Env.File.WrapTaskRun}}"]
timeout: "{{Task.Timeout}}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between Env.Wrapped.Timeout and Task.Timeout feels unnecessary. Switching to either Task.Wrapped.Timeout, or Env.Timeout would make it consistent. Since the timeout is on the action itself, maybe it could be Task.Action.Timeout and Env.Action.Timeout?

We should also think about the general forwarding mechanism. How will we forward #118 when we make that? Can there be a general mechanism, or will we give the template author responsibility to plumb through the features?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I did not see 118, let me read and incorporate it.

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
### Apptainer environment template

The same pattern applies to Apptainer (daemonless, each wrap hook invokes
`apptainer exec` directly rather than exec'ing into a running container). See
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't it run as a daemon? This seems necessary for many use cases.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think daemonless was meaning the "apptainer" daemon, like the persistent "docker daemon", not necessarily referring to the container as a daemon. Maybe a framing difference here.

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
[Appendix A: Apptainer environment template](#appendix-a-apptainer-environment-template)
for the full example.

### Job template that works with any environment
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should just be able to point to our existing samples. Maybe add a technical requirement that the way we've been writing jobs already must work with or without wrapping to the extent we can do that.

I guess the purpose of this is to show the runOnHost option?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - exactly. The RunOnHost was an escape hatch.

Lets chat offline how we can offer the escape hatch. Either within the hook, or as an explicit declaration.

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
often where performance problems hide.

5. **Privilege isolation.** Run inner actions as a different user or with reduced
capabilities by wrapping the command with `sudo -u`, `unshare`, or a jailed shell.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the inner environments and task runs can select runOnHost then this use case is weaker.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the runOnHost escape hatch would change this. But nonetheless, examples such as running a container would be able to use the sudo -u.

I'll re-frame this as a addendum use case, not necessarily an argument.

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
4. **Cross-OS wrapping.** The same-path bind-mount requirement assumes the host and
the wrapped execution context share path-separator conventions (Linux host with
Linux container, or Windows with Windows). Cross-OS wrapping (e.g., a Windows host
launching a Linux container) is not supported by this RFC.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The responsibility of the RFC would be to provide all information necessary to do this Cross-OS wrapping, not to assist in any way. I think it does that?

Copy link
Copy Markdown
Author

@leongdl leongdl May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, let me delete it.

Along the way I was thinking - why wrap windows containers in linux, and vice versa. Although linux containers on windows absolutely work but is terrible thorugh WSL.

Was originall thinking to cut some scope or defer this family of cases for later.

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated

### Security: container isolation boundaries

Each container instance should be scoped to a single security boundary. In the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec isn't about containers, it's about the ability to perform action wrapping reliably. I think the scheduler doesn't have any responsibility about "container," its responsibility is purely about providing all the specified arguments to the wrap* actions. The RFC should focus exclusively on the wrapping part where it is about the OpenJD behavior, and talk about containers in example contexts that are about containers.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - I'll do a pass and move this into my separate design doc instead.

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
2. The container does not run with elevated privileges (`--privileged`) unless explicitly
configured by the environment template author.
3. Bind mounts are scoped to the session working directory and explicitly declared paths,
not the entire host filesystem.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these are scheduler responsibilities - they are the responsibility of the environment implementer writing support for containers. The spec should ensure that the environment has sufficient capabilities to do what it needs to do.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I think here, AI here thought "deadline" overall as the scheduler, I'll delete these.

Comment thread rfcs/0008-environment-wrap-task-run.md Outdated
```

1. *onEnter* — The action to run when entering the environment.
2. *onWrapEnter* — If provided, this action is run instead of the `onEnter` action of
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding an all or nothing rule. Either all onWrap* must be included, or none of them. This would help against accidentally implementing just part of the wrapping, then getting hard to debug results because not everything is running where expected.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! I'll add that into the spec.

While the template yaml / json is slightly larger it is much harder to do the wrong thing.

Signed-off-by: David Leong <116610336+leongdl@users.noreply.github.com>
@leongdl
Copy link
Copy Markdown
Author

leongdl commented May 15, 2026

Update for RFC 0008 reviewers

The RFC has been substantially revised and synthesized into the 2023‑09 spec and wiki. Highlights since the last review:

Scope expanded from one hook to three. The proposal now wraps the full environment lifecycle, not just task run:

  • onWrapEnter — replaces an inner environment's onEnter
  • onWrapTaskRun — replaces a task's onRun (the original proposal)
  • onWrapExit — replaces an inner environment's onExit

The file was renamed accordingly: 0008-environment-wrap-task-run.md0008-environment-wrap-actions.md. The wrapping environment's own onEnter/onExit are never wrapped — they always run on the host.

New hard constraints, enforced before session entry:

  1. All‑or‑nothing — an environment must define all three wrap hooks or none.
  2. Single‑layer — at most one wrapping environment per session stack.
  3. EXPR prerequisite — templates listing WRAP_ACTIONS must also list EXPR.

New template‑variable scopes:

  • Env.Wrapped.* — available only inside onWrapEnter / onWrapExit
  • Task.Action.* — available only inside onWrapTaskRun

Schedulers must reject templates that reference these outside their allowed hooks.

Timeout semantics clarified. A wrap hook's timeout bounds the host‑side script (infrastructure side); the wrapped action's timeout is exposed to the script via Task.Action.Timeout / Env.Wrapped.Action.Timeout (workload side, 0 when unset). onWrapExit defaults to 300s like onExit; the other two have no default.

Exit‑code 125 reserved for infrastructure failure (Docker daemon unreachable, image pull failed, SSH refused, etc.). Other non‑zero exits are workload failures. Schedulers may use this to drive retry / worker‑eviction / alerting.

Stdout macro handling specified. Schedulers scan the wrap script's stdout for OpenJD macros (openjd_env, openjd_unset_env, …), not the wrapped process's. Wrap scripts must forward the wrapped process's stdout/stderr verbatim, so macros emitted by the wrapped process are recognized identically.

Synthesized into the normative docs:

  • wiki/2023-09-Template-Schemas.mdWRAP_ACTIONS registered in the extensions list; hooks, scopes, timeout table, and exit‑125 convention added.
  • wiki/How-Jobs-Are-Run.md — lifecycle interception described and the stdout‑scanning rule documented.

Re‑review focus suggestions: the all‑or‑nothing + single‑layer rules, the Env.Wrapped.* / Task.Action.* scope enforcement, and whether 125 is the right sentinel for infrastructure failure.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
onWrapEnter:
command: bash
args: ["{{Env.File.Wrap}}"]
timeout: "{{Env.Wrapped.Action.Timeout}}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking some more about the names for the wrapped action variables, these two properties would be useful:

  1. They have exactly the same name across all of the onWrap* actions.
  2. They're compatible with a future introspection where the action can reference itself, probably via Action.*.

Based on that, instead of Env.Wrapped.Action.* and Task.Action.*, the names WrappedAction.* is a good candidate to use uniformly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll refactor to just WrappedAction.*

#!/usr/bin/env bash
set -euo pipefail
DOCKER_CONTAINER_ID=$(docker container run --rm --detach \
--mount 'type=bind,src={{Session.WorkingDirectory}},dst={{Session.WorkingDirectory}}' \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'type=bind,src={{Session.WorkingDirectory}},dst={{Session.WorkingDirectory}}'

For this part we'll want it to be inside of a repr_sh instead of in single quotes. Does that need a let binding or is there a different good way to represent it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I excluded f-strings from the EXPR extension, but maybe that's how it could be more pleasant to write?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add f-format strings later since it is a programming improvements.
After going through more examples, I actually think let bindings is a great way, since we can print them out, not recompute the variable etc. This can be a great programmer guide and suggestion.

Will update with repr_sh, good suggestion. I guess it is useful mainly on CMF, since SMF we use a simpler directory.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
{{ repr_sh(Env.Wrapped.Action.Command) }} \
{{ repr_sh(Env.Wrapped.Action.Args) }}

- name: WrapTask
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Wrap and WrapTask could be the same script file if the wrapped action variable names are the same for this example.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True - given the comment above, I'll align the variables to WrappedAction then this example can be shorter. Its still flexible to have specific scripts per hook, so usability wise it is good.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated

### Execution order with a wrapping environment

For a session with a wrapping Docker environment `A` and a non-wrapping inner step
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth mentioning here that the wrapping environment doesn't have to be outermost? B could be the wrapping environment, then A runs like normal, B wraps the tasks' onRun.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's definitely an option. As long as it meets our tenet of 1 wrapped action at most in the whole structure.

I'll add a 1 sentence to keep it short, and a conformance test as example.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
```

On a queue with no wrapping environment, `A` does not exist and every action runs
on the host. The job template and the inner environments are unchanged.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "on the host," maybe say "normally" since the wrapped actions typically run on the host as well.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I have applied this change to remove "on the host" generally throughout this document.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
A.onExit → HOST
```

On a queue with no wrapping environment, `A` does not exist and every action runs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open Job Description doesn't define what a queue is. Deadline Cloud queue environments are based on the environment template specification. Should reword to ensure only the OpenJD terms are being used.

Copy link
Copy Markdown
Author

@leongdl leongdl May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point :) Now a steering rule in my own skills!

Comment thread rfcs/0008-environment-wrap-actions.md Outdated

> Job templates should be portable in a way to run them, unmodified, with either a
> Conda, Rez, Docker, or Apptainer environment template that provides the software
> environment to run in.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this is written it looks like a quote from the tenets, but it's an example application of the portability tenet into this application. Because of that I would remove the > quoting from it to make it clear this text is primary not a quote.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Will remove the >

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
context (environment variables, timeout, enter/task/exit distinction) and
composes through the environment-template model.

### Docker exec pattern in OpenJD
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to delete this section, bash-in-docker was already referenced above.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
under `.Action.` maps to a named field of the wrapped `<Action>` object. This has
three concrete benefits:

1. Symmetry between the task and environment branches (both use `.Action.`),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't copy/paste them though, I think matching the full names exactly is better.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - will rename.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
straightforward to iterate over in a list comprehension:

```yaml
{{ repr_sh(flatten([['-e', e] for e in Task.Action.Environment])) }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if e.split('=', 1) works as expected to be able to separately access the key and value?


## Security Considerations

### Command injection
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would frame this more that the template author is responsible to ensure the template is correct about quoting all input, and our job in the spec is to provide primitives that enable them to do so. repr_sh is key to that.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true - the spec can only give the best practice. Our examples is the demonstrator, but the user is the one is responsible to re-use our examples. Will do!

would fit directly. An `onRun` near the 256 KB macOS `ARG_MAX`, for example,
may fail only when wrapped.

Implementations SHOULD validate command-line length before exec and produce
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who does this affect? It's unclear whether this is about the OpenJD implementation or the template?

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
(the wrap environment) from the job-logic layer (the job template), rather than
annotating individual processes with a runtime identifier.

### Universal Wrapper pattern
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's describing the idea of this spec, not prior art? I'd suggest to remove this section.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
exec'ing tasks into it. This RFC generalizes that pattern to work at any
environment level and extends it to cover inner-environment setup and teardown.

### Conda and Rez queue environments
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section doesn't seem to describe precedent. Both conda and rez have run commands that could be used in a wrapping environment though, maybe that's worth mentioning here. Otherwise I'd delete this section.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
explicit wrap-hook approach keeps the specification general and the user in
control.

### Global config toggle for runtime selection
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe merge this with the above section to tighten up the RFC (We're at line 923! I think it can be shorter).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, too much details :)

> 3. *EXPR prerequisite.* A template that lists `WRAP_ACTIONS` in `extensions:` must also list `EXPR`.
> Schedulers must reject templates that list `WRAP_ACTIONS` without also listing `EXPR`.
> 4. The wrapping environment's own `onEnter` and `onExit` are never wrapped; they always run on the
> host.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest "run normally" instead of "run on the host." You'll want to do a sweep for this one throughout.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, that makes sense.

Comment thread wiki/2023-09-Template-Schemas.md Outdated
The host uses the return code of the **command** run to determine success or failure of the Action. A zero exit code
indicates success, and any non-zero exit code indicates failure. A timeout also indicates failure.

When the WRAP_ACTIONS extension is in use, wrap scripts should signal infrastructure failure (Docker
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we already have this kind of signal defined elsewhere? This doesn't feel specific to wrap actions, I think we'd either exclude this, or define this as a general thing that all actions can do.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - I was thinking about timeouts and how to handle docker general failures such as cannot start shared device, and found this 125 behavior. We can definitely cut it for now, and if we have a specific need to distinguish a wrapper failure vs task failure we can introduce this.

(Seemed like an interesting add, but can also be a single sentence for the future to consider.)

Comment thread wiki/How-Jobs-Are-Run.md
in the **Session** until the **Environment** that emitted it is exited.

When the WRAP_ACTIONS extension is in use, schedulers must scan the stdout of the wrap script (`onWrapEnter`,
`onWrapTaskRun`, or `onWrapExit`) for these macros, not the stdout of the wrapped process. Wrap scripts must
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to specify this? The scheduler has no visibility to the inside process, there might be multiple or no inside processes at all. Because of that I don't think there are alternatives to what's being described.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the word scheduler is confused here, how about I just say "forward the macros" so the wrapped process's messages are still emitted by the wrapper so we do not lose the content.?


**Available in `onWrapEnter` and `onWrapExit`:**

| Variable | Type | Description |
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have decided to unify the variables as per above.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
- Variables set by the wrapping environment's own `onEnter` through any
channel other than `openjd_env` stdout macros.

Wrap scripts that need any of these variables in the wrapped execution
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to shorten this paragraph and section, it is defining only forwarding semantics.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
channel other than `openjd_env` stdout macros.

Wrap scripts that need any of these variables in the wrapped execution
context MUST forward them explicitly. For Docker, this typically means
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docker is "example", no need to be so wordy. The RFC should be agnostic of docker.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
`Command` and `Args` may reference host filesystem paths, most commonly
embedded files (`{{Env.File.X}}` resolves to a host path), session working
directory entries, or job-attachment paths. The wrap environment MUST ensure
that the wrapped process can resolve every such path, typically by
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can shorten this - docker and containers is an example to use bind mount and maintain path equivalence.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
+ - An inner Environment's onExit is replaced by the wrapping Environment's onWrapExit.
+
+ The wrapping Environment's own onEnter and onExit are never wrapped; they always run
+ on the host. If more than one Environment in the session stack defines any wrap hook,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, replace "on the host" as per above.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated

### Stdout forwarding and macro propagation

The wrap script is the scheduler's direct child; the wrapped process is a
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check the use of "scheduler" here - does the session runner make more sense?

Comment thread rfcs/0008-environment-wrap-actions.md Outdated

Wrap scripts MUST forward the wrapped process's stdout and stderr to their own
stdout and stderr verbatim, without buffering, filtering, or transformation.
This is the default behavior of `docker container exec`, `apptainer exec`, and
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, keep it shorter

Comment thread rfcs/0008-environment-wrap-actions.md Outdated

#### Macro propagation (contract)

Schedulers MUST scan the wrap script's stdout for OpenJD stdout macros: the
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOt the scheduler - this should be transparently propagated via logging as per previous section. The OJD session runtime detects the openjd_ macros and applies them normally.

Given the wrapper, logs are pass through, and wrapped task's macro usage are unchanged and applied to the job as per the spec.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
`openjd_env`, `openjd_fail`, `openjd_progress`, and `openjd_status` macros
defined in [How Jobs Are Run](https://github.com/OpenJobDescription/openjd-specifications/wiki/How-Jobs-Are-Run),
along with any future macros added by later specifications. Schedulers MUST
NOT scan the grand-child's stdout directly. This preserves the existing
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase or remove this - the word scheduler and scan here does is not correct.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
- Variables emitted by wrapped actions (an inner environment's `onEnter`
running via `onWrapEnter`, or earlier wrapped task `onRun` invocations).

This is the chain that makes wrap composition useful: an inner Conda
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example: an inner Conda

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
[Basic Example](#basic-example) uses this pattern to publish the container
ID for use in subsequent wrap hooks.

#### Alternatives considered
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove the alternative considered for openjd_env propagation. Move it to out of scope above and propose current semantics of the spec apply and pass through the wrap. (Or consider the use of transitive?)

Comment thread rfcs/0008-environment-wrap-actions.md Outdated

1. **Workload failure**: the wrapped action itself exited non-zero. The
job logic is at fault.
2. **Infrastructure failure**: the wrap script could not deliver the
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion, we combine infrastructure failure and workfload failure. Return non-zero and keep it simple. The exit code 125 can be moved to a future consideration if separating the failure mode is important.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
Finer-grained classification (network vs. daemon vs. image-pull) is deferred
to [Future Work](#future-work).

#### Scheduler policy
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this section #### scheduler policy since it is superfluous.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated
stops the container, closes the remote session, releases the bind-mount, or
otherwise releases the host resources allocated by `onEnter`.

**Summary of run-on-failure guarantees:**
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lead with this table and keep each description above shorter. Can make it easier to consume as a reader.

Comment thread rfcs/0008-environment-wrap-actions.md Outdated

Any of the following idioms satisfies the requirement:

- Shell wrap scripts SHOULD `trap` SIGTERM and forward it, or `exec` the
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify this section - signal and error code propagation is the key and focus The inner task or enter exit fails, it is propagated by the wrap script. Keep 1 example with a sig term, and 1 with docker. Make this shorter.

the wrap script to the wrapped process is normative and specified in
[Cancelation behavior](#cancelation-behavior).

## Design Choice Rationale
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be an appendix. Its good for discussion, maybe can be deleted in the final version.

leongdl added 2 commits May 16, 2026 18:17
…d conformance suite

Address PR OpenJobDescription#130 review feedback. Collapse Env.Wrapped.Action.* and
Task.Action.* into a single WrappedAction.* namespace shared by all three
wrap hooks; this lets a single helper script be reused across hooks and
composes cleanly with future Action.* self-introspection. Drop the
duplicate Out-of-scope section, the inline timeout-rationale comment in
the Basic Example, and Docker-specific examples in the Host-environment
section. Replace "queue"/"on the host" wording per OpenJD terminology.
Note that the wrapping environment is not required to be the outermost.
Collapse near-identical wrap scripts in the appendix templates (Docker
4→2, Apptainer 3→1, SSH 2→1, Profiling 2→1); EchoWrap keeps three to
demonstrate per-hook customization.

Add the WRAP_ACTIONS conformance suite: 21 valid .test.yaml + 7
.invalid fixtures covering interception of all three hooks, the
all-or-nothing rule, the single-wrap-layer rule, the variable-scope
rule, extension gating, openjd_env propagation, and the shell-quoting
security cases from the implementation considerations.
command: sh
args:
- "-c"
- "{{ repr_sh(WrappedAction.Command) }} {{ repr_sh(WrappedAction.Args) }}"
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if this should "fail" - since the enter is not wrapped.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks valid to me because that's syntax check, not a runtime check that the template it's applied against actually has an onEnter.

This RFC is additive and gated by the `WRAP_ACTIONS` extension name declared under
RFC 0002. Specifically:

- Schedulers that do not implement `WRAP_ACTIONS` MUST reject templates that list
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about the use of the word "schedulers", is there a specific verb we call openJD runtimes?

wrap hook's `<Cancelation>` policy (see
[Cancelation behavior](#cancelation-behavior)).

## Design Choice Rationale
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if we would like to keep this section - for brevity.

# onEnter, onWrapEnter, onWrapTaskRun, onWrapExit, or onExit must be defined.
specificationVersion: environment-2023-09
extensions:
- WRAP_ACTIONS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should already fail because EXPR is missing, so it doesn't test what the comment says.

args: ["entered"]
onWrapEnter:
command: echo
args: ["wrap-enter", "{{Env.Wrapped.Name}}"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going with WrappedEnv.Name (and likely WrappedStep.Name inside onWrapTaskRun is probably more consistent with the WrappedAction convension.

- "{{ repr_sh(WrappedAction.Command) }} {{ repr_sh(WrappedAction.Args) }}"

runOn:
- posix
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't posix-specific, better to run on all OS.

script:
actions:
onWrapEnter:
command: sh
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion change this to bash as that's what we generally have on Windows as well.

command: echo
args: ["ORIGINAL_TASK"]

environments:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to double check: Make sure we're testing with wrapping environment inside the job (both as job environments and step environments) as well. I'm not very far in so good chance they're later!

lifecycle actions of inner environments and tasks with before/after logic
provided by the outer environment.

### Workflow DSLs with container abstraction
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we skip this part, it's not actually about wrapping.

(the wrap environment) from the job-logic layer (the job template), rather than
annotating individual processes with a runtime identifier.

### Conda and Rez run commands
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could include Docker here too, its command is similar.


## Rejected Ideas

### Extending `<Action>` with a `wrapper` field
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider dropping this one.

Stacking multiple wrap environments (for example, a profiling wrapper outside a
container wrapper) is valuable but adds implementation complexity. Deferred.

### Wrap-aware cancelation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't too bad to implement, maybe we want to add this in?

from the wrap mechanic. A separate RFC for environment health monitoring is the
appropriate venue.

### Cross-OS wrapping
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider dropping this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants